Clustering US Senators
Posted on Sun 23 September 2018 in Data Analysis
The goal is to cluster Senators to find those who follow the mainstream of their party and those who don't.
The dataset records the Senate votes on proposed legislation from the 114th Senate. Each row represents a Senator. The first three columns identify the Senator (including their party), and every remaining column represents a vote. A 0 in a vote cell means the Senator voted No on the bill, 1 means the Senator voted Yes, and 0.5 means the Senator abstained.
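To make the layout concrete, here is a tiny hypothetical slice in the same shape. Only the `party` column name comes from this post; the `name` and `state` columns and the bill column names `"00001"` to `"00003"` are placeholders assumed for illustration. The point is that everything after the third column is a vote, which is why the clustering step selects `votes.iloc[:, 3:]`.

```python
import pandas as pd

# Hypothetical three-Senator, three-bill slice laid out like 114_congress.csv:
# identifying columns first, then one column per vote.
# "name", "state" and the bill column names are placeholders (assumptions).
toy = pd.DataFrame({
    "name": ["Senator A", "Senator B", "Senator C"],
    "party": ["D", "R", "I"],
    "state": ["XX", "YY", "ZZ"],
    "00001": [1.0, 0.0, 1.0],   # 1 = Yes, 0 = No
    "00002": [0.0, 1.0, 0.5],   # 0.5 = abstained
    "00003": [1.0, 1.0, 0.0],
})

# Everything from the fourth column onward is a vote.
vote_matrix = toy.iloc[:, 3:]
print(vote_matrix.shape)  # (3, 3)
```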
import pandas as pd
votes = pd.read_csv("114_congress.csv")
votes.head()
Number of Senators by Party, Mean Vote by Roll Call
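# Number of Senators in each party
print(votes['party'].value_counts())
# Mean of each vote column; numeric_only=True skips the text columns (recent pandas raises a TypeError otherwise)
print(votes.mean(numeric_only=True))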
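Because of the 0 / 0.5 / 1 encoding, each column mean is the share of Senators who voted Yes on that bill, with an abstention counted as half a Yes. The sketch below checks this on a hypothetical four-Senator, two-bill frame (bill column names are placeholders) and also shows `numeric_only=True` skipping the text `party` column.

```python
import pandas as pd

# Hypothetical slice: four Senators, two bills (placeholder column names).
toy = pd.DataFrame({
    "party": ["D", "D", "R", "R"],
    "00001": [1.0, 1.0, 0.0, 0.5],
    "00002": [0.0, 0.0, 1.0, 1.0],
})

# numeric_only=True ignores the non-numeric "party" column.
bill_means = toy.mean(numeric_only=True)
print(bill_means)
# Bill 00001: (1 + 1 + 0 + 0.5) / 4 = 0.625, i.e. 62.5% Yes with the abstention as half.
# Bill 00002: (0 + 0 + 1 + 1) / 4 = 0.5.
```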
Clustering
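from sklearn.cluster import KMeans
# Two clusters; n_init=10 matches the scikit-learn default at the time (newer versions default to "auto")
kmeans_model = KMeans(n_clusters=2, n_init=10, random_state=1)
# fit_transform returns each Senator's distance to each of the two cluster centers
senator_distances = kmeans_model.fit_transform(votes.iloc[:, 3:])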
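`fit_transform` fits the model and returns, for every row, its Euclidean distance to each cluster center, so `senator_distances` has one column per cluster. The sketch below verifies that on a small hypothetical vote matrix by recomputing the distances from `cluster_centers_` with NumPy, and checks that the nearest center matches `labels_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical vote matrix: four Senators (rows), three bills (columns).
X = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.5, 1.0],
])

model = KMeans(n_clusters=2, n_init=10, random_state=1)
distances = model.fit_transform(X)  # shape (n_samples, n_clusters)

# Recompute: distance from every row to every center via broadcasting.
manual = np.linalg.norm(X[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
print(np.allclose(distances, manual))  # True

# Each Senator's label is the index of their nearest center.
print(np.argmin(distances, axis=1), model.labels_)
```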
Count how many Senators from each party ended up in each cluster.
# Cluster (0 or 1) assigned to each Senator
labels = kmeans_model.labels_
print(labels)
pd.crosstab(labels, votes["party"])
The first cluster (label 0) contains 41 Democrats and both Independents. The second cluster (label 1) contains 3 Democrats and 54 Republicans.
It appears that these 3 Democrats vote more like Republicans than like their own party. We'll explore these 3 Senators in more depth.
democratic_outliers = votes[(labels == 1) & (votes["party"] == "D")]
print(democratic_outliers)
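K-means cluster numbers carry no meaning of their own, so `labels == 1` only selects the Republican-majority cluster for this particular run. A slightly more robust pattern, sketched here as an alternative rather than the post's method, is to name each cluster after its majority party via the crosstab and flag Senators whose party differs. The `party` and `labels` values below are hypothetical stand-ins for `votes["party"]` and `kmeans_model.labels_`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for votes["party"] and kmeans_model.labels_.
party = pd.Series(["D", "D", "D", "R", "R", "R"], name="party")
labels = np.array([0, 0, 1, 1, 1, 1])

# Cross-tabulate clusters against parties, then name each cluster
# after its most common party.
counts = pd.crosstab(labels, party)
majority = counts.idxmax(axis=1)   # cluster number -> majority party

# Outliers: Senators whose party differs from their cluster's majority.
cluster_party = majority.reindex(labels).to_numpy()
outliers = party[party.to_numpy() != cluster_party]
print(outliers)
```

On the real data this rule would also flag any Independent placed in the Democratic-majority cluster, so filtering on `party == "D"` as above is still useful when only Democrats are of interest.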
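%matplotlib inline
import matplotlib.pyplot as plt
# Each point is a Senator: x = distance to the cluster 0 center, y = distance to the cluster 1 center
plt.scatter(senator_distances[:, 0], senator_distances[:, 1], c=labels)
plt.xlabel("Distance to cluster 0 center")
plt.ylabel("Distance to cluster 1 center")
plt.show()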
Distance from the cluster center
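Let's find the extreme Senators
To identify extremists, we cube each Senator's distance to each of the two cluster centers and sum the results. Cubing makes the larger distance dominate the sum, which rewards Senators who sit far from one of the centers.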
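A small worked example shows why cubing matters. Take two hypothetical Senators with the same total distance to the two centers: one halfway between them and one close to a single center but far from the other. A plain sum ties them, while the sum of cubes ranks the second one as more extreme.

```python
import numpy as np

# Hypothetical distances to the two cluster centers (one column per center).
senator_distances = np.array([
    [2.0, 2.0],   # halfway between the two centers
    [1.0, 3.0],   # close to one center, far from the other
])

plain = senator_distances.sum(axis=1)
cubed = (senator_distances ** 3).sum(axis=1)
print(plain)  # [4. 4.]
print(cubed)  # [16. 28.]
```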
extremism = (senator_distances ** 3).sum(axis=1)
votes['extremism'] = extremism
votes.sort_values("extremism", inplace=True, ascending=False)
print(votes.head(10))